For my data replication assignment I decided to replicate analyses from “Effects of Female Group Size on the Number of Males in Blue Monkey (Cercopithecus mitis) Groups” by Lu Gao & Marina Cords. Blue monkeys normally live in groups with multiple females but only one male. During mating and conception periods, additional males may join a group for a time to mate with females before leaving. The paper investigated how well female group size predicts the movement of males in and out of groups during mating and conception periods. The authors studied 8 different groups in Kakamega Forest, Kenya, from 2006 to 2014. The data set that I used came from the corresponding Dryad page. I determined that I would only need the first data set, as all my analyses concern conception periods.
Looking specifically at conception periods, Gao & Cords used binomial-family generalized linear mixed models (GLMMs) to assess female group size as a predictor of the number of males and of sexually active females within the group. Using the GLMM and the data from each conception period, they created a scatterplot with the fitted GLMM line drawn over the data. They also compiled a table with information about each group: the number of days observed, the mean group size, and the proportion of multimale days. Then, for observation days within conception periods only, they calculated the number of days and the proportion of days with multiple males. For this assignment I will replicate all calculations in Table 1, the two GLMMs (multiple males and sexually active females), and the graph.
library(curl)
library(tidyverse)
library(lubridate, warn.conflicts = FALSE)
library(broom.mixed)
library(ggplot2)
library(gridExtra)
library(tidyr)
library(MCMCglmm)
library(lme4)
Using Dryad I was able to find the corresponding data for the paper. There were two Excel data sheets; I put both in my project folder and uploaded them to GitHub. The second sheet contains information about individuals within the groups, which I would not be using for my analyses. I therefore decided not to import it, but it is still in the project folder just in case. Once the files were on GitHub, it was easy to curl in datasheet 1.
f <- curl("https://raw.githubusercontent.com/rhottensomers/reesehs-data-replication-assignment/main/doi_10.5061_dryad.kkwh70s31__v4%202/Gao_Cords_2020_IJP_Dataset1_Daily_data.csv") #imports data from paper dataset 1
d1 <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = FALSE) #reads data and creates data frame from it
head(d1, 4) #returns first 4 rows of data frame
## Date Group N.sex.active.females N.males.in.group Mating.season
## 1 1/1/06 Gn ND ND NO
## 2 1/2/06 Gn 0 1 NO
## 3 1/3/06 Gn 2 1 NO
## 4 1/4/06 Gn 0 1 NO
## Conception.period Female.Group.Size
## 1 NO 12
## 2 NO 12
## 3 NO 12
## 4 NO 12
This datasheet contains data from each observation day, of which there were 16,719. The imported data frame has 17,337 rows, which also include days marked ND (no data). Each row records the date, the group, the number of sexually active females and of males in the group, whether the day fell in a mating season or conception period, and the female group size. After importing, I needed to assess how the data were structured and reshape them so they would be ready for analysis.
str(d1) #structure of dataframe
## 'data.frame': 17337 obs. of 7 variables:
## $ Date : chr "1/1/06" "1/2/06" "1/3/06" "1/4/06" ...
## $ Group : chr "Gn" "Gn" "Gn" "Gn" ...
## $ N.sex.active.females: chr "ND" "0" "2" "0" ...
## $ N.males.in.group : chr "ND" "1" "1" "1" ...
## $ Mating.season : chr "NO" "NO" "NO" "NO" ...
## $ Conception.period : chr "NO" "NO" "NO" "NO" ...
## $ Female.Group.Size : int 12 12 12 12 12 12 12 12 12 12 ...
# R does not understand "ND", so I converted all NDs to NAs within the data frame
d1$N.sex.active.females <- na_if(d1$N.sex.active.females,"ND")
d1$N.males.in.group <- na_if(d1$N.males.in.group,"ND")
d1$Mating.season <- na_if(d1$Mating.season, "ND")
# converts mating season and conception period into factors
d1$Mating.season <- as.factor(d1$Mating.season)
d1$Conception.period <- as.factor(d1$Conception.period)
# converts the counts from character to numeric so they are easier to use in analyses
d1$N.sex.active.females <- as.numeric(d1$N.sex.active.females)
d1$N.males.in.group <- as.numeric(d1$N.males.in.group)
# uses lubridate to parse the month/day/year strings into Date values
d1$Date <- mdy(d1$Date)
#check that each column is in desired structure
str(d1)
## 'data.frame': 17337 obs. of 7 variables:
## $ Date : Date, format: "2006-01-01" "2006-01-02" ...
## $ Group : chr "Gn" "Gn" "Gn" "Gn" ...
## $ N.sex.active.females: num NA 0 2 0 1 0 0 1 NA NA ...
## $ N.males.in.group : num NA 1 1 1 1 1 1 1 NA NA ...
## $ Mating.season : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
## $ Conception.period : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
## $ Female.Group.Size : int 12 12 12 12 12 12 12 12 12 12 ...
head(d1, 1)
## Date Group N.sex.active.females N.males.in.group Mating.season
## 1 2006-01-01 Gn NA NA NO
## Conception.period Female.Group.Size
## 1 NO 12
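The ND-to-NA conversion and the numeric conversion above could also be handled at import time. The following is a minimal sketch in base R, assuming a small made-up CSV with the same column layout as dataset 1 (the object names `csv_text` and `toy` are hypothetical and the values are illustrative, not taken from the paper's data). Passing `na.strings = "ND"` to `read.csv()` makes R treat ND as missing, so the count columns are parsed as numbers directly.

```r
# Made-up CSV text that mimics the column layout of dataset 1 (illustrative values only)
csv_text <- "Date,Group,N.sex.active.females,N.males.in.group,Mating.season,Conception.period,Female.Group.Size
1/1/06,Gn,ND,ND,NO,NO,12
1/2/06,Gn,0,1,NO,NO,12
1/3/06,Gn,2,1,NO,NO,12"

# na.strings = "ND" converts every ND field to NA while reading
toy <- read.csv(text = csv_text, na.strings = "ND", stringsAsFactors = FALSE)

# Base R equivalent of lubridate::mdy() for this month/day/two-digit-year format
toy$Date <- as.Date(toy$Date, format = "%m/%d/%y")

str(toy)
```

With this approach the separate `na_if()` and `as.numeric()` steps would not be needed, although the factor conversion for the YES/NO columns would still have to be done afterwards.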
Now that the data were in my desired structure, I ran an initial analysis of the proportion of group observation days that were multimale days. The article states that this was 15% of days; my answer differed slightly.
# counts the days on which the number of males in the group is at least two, ignoring NAs
mmd <- sum(d1$N.males.in.group >= 2, na.rm = TRUE)
mmd/nrow(d1) * 100 #finds percent of days with multiple male
## [1] 14.61614
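While 14.616% is slightly lower than the reported 15%, the difference is small. One possible reason is the denominator: nrow(d1) counts all 17,337 rows, including days with NA values, so it may be larger than the number of days the authors used. Data that were excluded but not reported, or rounding, could also contribute.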
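To illustrate how the choice of denominator affects a percentage like this, here is a minimal sketch on a made-up vector (the name `n_males` and its values are hypothetical, not from the data set). It compares dividing by all rows with dividing only by the days that have an observation.

```r
# Made-up male counts for eight days; NA marks a day with no data
n_males <- c(NA, 1, 1, 2, 3, NA, 1, 2)

# Number of multimale days (two or more males), ignoring NAs
multi <- sum(n_males >= 2, na.rm = TRUE)

# Denominator = every row, including days with no data
pct_all_rows <- multi / length(n_males) * 100

# Denominator = only the days that were actually observed
pct_observed <- multi / sum(!is.na(n_males)) * 100

# mean() with na.rm = TRUE gives the same value as pct_observed
pct_observed_alt <- mean(n_males >= 2, na.rm = TRUE) * 100

c(all_rows = pct_all_rows, observed = pct_observed)
```

The analogous call on the real data would be `mean(d1$N.males.in.group >= 2, na.rm = TRUE) * 100`. Whether that matches the 15% in the article is something to check by running it; no result is claimed here.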
The analyses that follow depend on knowing whether there were multiple males or multiple sexually active females on each day of data. The data store each count in its own column, N.males.in.group and N.sex.active.females, but I needed to divide the counts into two groups: those less than two and those equal to or above two (which is how the authors define "multiple"). To accomplish this I used mutate() with case_when() from the tidyverse to bin the counts. This created two new columns, Mbin and Fbin, that label whether each day had 2+ or <2 of each count.
d1 <- d1 %>% mutate(Mbin = case_when(N.males.in.group < 2 ~ "<2",N.males.in.group >= 2 ~ "2+"))
d1 <- d1 %>% mutate(Fbin = case_when(N.sex.active.females < 2 ~ "<2F",N.sex.active.females >= 2 ~ "2+F"))
head(d1)
## Date Group N.sex.active.females N.males.in.group Mating.season
## 1 2006-01-01 Gn NA NA NO
## 2 2006-01-02 Gn 0 1 NO
## 3 2006-01-03 Gn 2 1 NO
## 4 2006-01-04 Gn 0 1 NO
## 5 2006-01-05 Gn 1 1 NO
## 6 2006-01-06 Gn 0 1 NO
## Conception.period Female.Group.Size Mbin Fbin
## 1 NO 12 <NA> <NA>
## 2 NO 12 <2 <2F
## 3 NO 12 <2 2+F
## 4 NO 12 <2 <2F
## 5 NO 12 <2 <2F
## 6 NO 12 <2 <2F
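As a quick check on this kind of binning, the sketch below uses base R on a made-up data frame (the object `toy` and the column `multimale` are hypothetical). It shows that days with missing counts stay NA rather than falling into either bin, and it builds a 0/1 version of the same split, which is one common way to code a binary response; this is an assumption about how the bins might be used later, not the authors' stated method.

```r
# Made-up example with the same column names as the real data
toy <- data.frame(
  Group = c("Gn", "Gn", "Gn", "Gs", "Gs"),
  N.males.in.group = c(NA, 1, 2, 3, 1)
)

# Same two-way split as Mbin; a missing count stays NA
toy$Mbin <- ifelse(toy$N.males.in.group >= 2, "2+", "<2")

# 0/1 indicator of a multimale day (NA where the count is missing)
toy$multimale <- as.integer(toy$N.males.in.group >= 2)

# Cross-tabulate groups against bins, keeping NA as its own column
table(toy$Group, toy$Mbin, useNA = "ifany")
```

Running the same `table()` call on d1 with Mbin or Fbin would show how many days fall in each bin per group and how many are missing.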